Latent-Domain Predictive Neural Speech Coding
Abstract
Neural audio/speech coding has recently demonstrated its capability to deliver high quality at much lower bitrates than traditional methods. However, existing neural audio/speech codecs employ either acoustic features or learned blind features with a convolutional neural network for encoding, by which there are still temporal redundancies within encoded features. This paper introduces latent-domain predictive coding into the VQ-VAE framework to fully remove such redundancies and proposes the TF-Codec for low-latency neural speech coding in an end-to-end manner. Specifically, the extracted features are encoded conditioned on a prediction from past quantized latent frames so that temporal correlations are further removed. Moreover, we introduce a learnable compression on the time-frequency input to adaptively adjust the attention paid to main frequencies and details at different bitrates. A differentiable vector quantization scheme based on distance-to-soft mapping and Gumbel-Softmax is proposed to better model the latent distributions with rate constraint. Subjective results on multilingual speech datasets show that, with low latency, the proposed TF-Codec at 1 kbps achieves significantly better quality than Opus at 9 kbps, and TF-Codec at 3 kbps outperforms both EVS at 9.6 kbps and Opus at 12 kbps. Numerous studies are conducted to demonstrate the effectiveness of these techniques.
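The differentiable quantization idea named in the abstract (distance-to-soft mapping followed by Gumbel-Softmax) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`squared_distances`, `gumbel_softmax`, `soft_quantize`), the use of negative squared Euclidean distance as logits, and the temperature `tau` are all assumptions made for illustration, and the rate constraint mentioned in the abstract is not modelled here.

```python
import math
import random


def squared_distances(z, codebook):
    """Squared Euclidean distance from latent vector z to every codeword."""
    return [sum((zi - ci) ** 2 for zi, ci in zip(z, c)) for c in codebook]


def gumbel_softmax(logits, tau, rng):
    """Draw a relaxed one-hot vector from logits using Gumbel(0, 1) noise."""
    noisy = []
    for logit in logits:
        # Clamp the uniform sample away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def soft_quantize(z, codebook, tau=0.5, rng=None):
    """Return (soft weights, soft reconstruction, hard codeword index).

    Assumption: logits are negative squared distances, so nearer codewords
    receive higher probability. The paper's exact mapping may differ.
    """
    rng = rng or random.Random(0)
    logits = [-d for d in squared_distances(z, codebook)]
    weights = gumbel_softmax(logits, tau, rng)
    # Soft reconstruction: probability-weighted mix of codewords.
    z_soft = [sum(w * c[k] for w, c in zip(weights, codebook))
              for k in range(len(z))]
    # Hard choice (what would be transmitted): the most probable codeword.
    hard_index = max(range(len(weights)), key=weights.__getitem__)
    return weights, z_soft, hard_index
```

In training, the soft weights would let gradients reach both the encoder and the codebook, while at inference only the hard index needs to be sent; lowering `tau` pushes the soft weights toward a one-hot selection.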
Similar resources
Neural Elements for Predictive Coding
Predictive coding theories of sensory brain function interpret the hierarchical construction of the cerebral cortex as a Bayesian, generative model capable of predicting the sensory data consistent with any given percept. Predictions are fed backward in the hierarchy and reciprocated by prediction error in the forward direction, acting to modify the representation of the outside world at increa...
Frequency Domain Coding of Speech
Frequency domain techniques for speech coding have recently received considerable attention. The basic concept of these methods is to divide the speech into frequency components by a filter bank (sub-band coding), or by a suitable transform (transform coding), and then encode them using adaptive PCM. Three basic factors are involved in the design of these coders: 1) the type of the filter bank ...
Sparsity in Linear Predictive Coding of Speech
This thesis deals with developing improved techniques for speech coding based on the recent developments in sparse signal representation. In particular, this work is motivated by the need to address some of the limitations of the well-known linear prediction (LP) model currently applied in many modern speech coders. In the first part of the thesis, we provide an overview of Sparse Linear Predict...
Speech Compression Using Linear Predictive Coding(lpc)
One of the most powerful speech analysis techniques is the method of linear predictive analysis. This method has become the predominant technique for representing speech for low bit rate transmission or storage. The importance of this method lies both in its ability to provide extremely accurate estimates of the speech parameters and in its relative speed of computation. The basic idea behind l...
Speech Compression Using Linear Predictive Coding
The aim of the project is to develop a system for encoding good quality speech at a low bit rate. To implement this we have used the most powerful speech analysis technique, called Linear Predictive Coding (LPC). It uses a 10th-order Levinson-Durbin recursion algorithm to accomplish the task. It provides extremely accurate estimates of speech parameters, and is relatively efficient for computation. The s...
Journal
Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Year: 2023
ISSN: 2329-9304, 2329-9290
DOI: https://doi.org/10.1109/taslp.2023.3277693